The UK consists of four countries: England, Scotland, Wales and Northern Ireland (in order of population size). Great Britain refers to the island that contains the first three. Administrative data for the UK often follow country divides, although the Office for National Statistics reports England and Wales together for many purposes. Administrative geographies are one of 8 overarching types of statistical geographies, as seen below:
Broadly, administrative geographies are the most general-purpose geographies for the UK, as this is the level at which policy is made. The units of the Statistical Building Blocks are also constrained by them. For example, Middle Layer Super Output Areas (MSOAs) and the smaller geographies below them all aggregate to Local Authority Districts. Administrative geographies come in several levels of detail:
Worth noting about the Statistical Building Blocks is that they are derived from population counts, not areas. Below is an overview of the thresholds used to create these geographies.
More about these population-weighted geographies here
You can get geographic data for the UK from the Open Geography Portal via an API call. This is convenient because you don’t have to store large files on your machine and can share your work more easily. For simplicity, we will use regions in the example below. There are nine regions in England.
library(raster)
library(knitr)
library(geojsonio)
library(sp)
library(tmap)
library(spdep)
library(reshape2)
library(rsq)
# Connect to the Open Geography Portal API and read the regions as an sp object
regions_json <- geojson_read("https://opendata.arcgis.com/datasets/8d3a9e6e7bd445e2bdcc26cdf007eac7_1.geojson", what = "sp")
# Keep a copy of the unmodified object
z_regions_json <- regions_json
plot(regions_json)
I’m using table WU02EW - Location of usual residence and place of work by age. This data is available down to MSOA, but we will be using it at regional level for England.
# Connect to the NOMIS API to get the data
# Note: for convenience I'm only using the English regions here; the geography range in the query also returns Wales (W92000004).
t_wu02Ew <- read.csv(file = "https://www.nomisweb.co.uk/api/v01/dataset/NM_1206_1.data.csv?date=latest&usual_residence=2013265921...2013265930&place_of_work=2013265921...2013265930&age=0...6&measures=20100", header=TRUE)
kable(head(t_wu02Ew))
| DATE | DATE_NAME | DATE_CODE | DATE_TYPE | DATE_TYPECODE | DATE_SORTORDER | USUAL_RESIDENCE | USUAL_RESIDENCE_NAME | USUAL_RESIDENCE_CODE | USUAL_RESIDENCE_TYPE | USUAL_RESIDENCE_TYPECODE | USUAL_RESIDENCE_SORTORDER | PLACE_OF_WORK | PLACE_OF_WORK_NAME | PLACE_OF_WORK_CODE | PLACE_OF_WORK_TYPE | PLACE_OF_WORK_TYPECODE | PLACE_OF_WORK_SORTORDER | AGE | AGE_NAME | AGE_CODE | AGE_TYPE | AGE_TYPECODE | AGE_SORTORDER | MEASURES | MEASURES_NAME | OBS_VALUE | OBS_STATUS | OBS_STATUS_NAME | OBS_CONF | OBS_CONF_NAME | URN | RECORD_OFFSET | RECORD_COUNT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011 | 2011 | 2011 | date | 0 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 0 | Aged 16 and over | 0 | Age | 1000 | 0 | 20100 | Value | 936525 | A | Normal Value | FALSE | Free (free for publication) | Nm-1206d1d32176e1d2013265921d2013265921d0d20100 | 0 | 700 |
| 2011 | 2011 | 2011 | date | 0 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 1 | Aged 16-24 | 1 | Age | 1000 | 1 | 20100 | Value | 130827 | A | Normal Value | FALSE | Free (free for publication) | Nm-1206d1d32176e1d2013265921d2013265921d1d20100 | 1 | 700 |
| 2011 | 2011 | 2011 | date | 0 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 2 | Aged 25-34 | 2 | Age | 1000 | 2 | 20100 | Value | 199584 | A | Normal Value | FALSE | Free (free for publication) | Nm-1206d1d32176e1d2013265921d2013265921d2d20100 | 2 | 700 |
| 2011 | 2011 | 2011 | date | 0 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 3 | Aged 35-49 | 3 | Age | 1000 | 3 | 20100 | Value | 344176 | A | Normal Value | FALSE | Free (free for publication) | Nm-1206d1d32176e1d2013265921d2013265921d3d20100 | 3 | 700 |
| 2011 | 2011 | 2011 | date | 0 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 4 | Aged 50-64 | 4 | Age | 1000 | 4 | 20100 | Value | 243426 | A | Normal Value | FALSE | Free (free for publication) | Nm-1206d1d32176e1d2013265921d2013265921d4d20100 | 4 | 700 |
| 2011 | 2011 | 2011 | date | 0 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 5 | Aged 65-74 | 5 | Age | 1000 | 5 | 20100 | Value | 15701 | A | Normal Value | FALSE | Free (free for publication) | Nm-1206d1d32176e1d2013265921d2013265921d5d20100 | 5 | 700 |
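The query string in the call above packs several filters into one URL. As a minimal sketch, the same URL can be assembled from named parts, which makes it easier to change the geography or age range later. The variable names `base_url`, `params` and `nomis_url` are my own and not part of the NOMIS API; the parameter names and values are copied from the call above.

```r
# Base endpoint for the dataset, copied from the read.csv() call above
base_url <- "https://www.nomisweb.co.uk/api/v01/dataset/NM_1206_1.data.csv"

# Query parameters as a named character vector (values copied from the call above)
params <- c(
  date            = "latest",
  usual_residence = "2013265921...2013265930",
  place_of_work   = "2013265921...2013265930",
  age             = "0...6",
  measures        = "20100"
)

# Join each name=value pair with "&" and append it to the endpoint
nomis_url <- paste0(
  base_url, "?",
  paste(names(params), params, sep = "=", collapse = "&")
)
print(nomis_url)
```

The resulting string can be passed to `read.csv(file = nomis_url, header = TRUE)` in place of the literal URL.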
This dataset is currently a flat 2D table despite containing multi-way tabulated counts: by region of origin (usual residence), region of destination (place of work) and age group. To wrangle this into counts by geographic unit for use with the polygon data, we apply the following transformation:
# names(t_wu02Ew)
# Select only the columns we need for now
ac <- c("USUAL_RESIDENCE_CODE", "AGE_NAME", "OBS_VALUE")
t_wu02Ew_ac <- t_wu02Ew[, ac]
reg_var <- xtabs(OBS_VALUE ~ USUAL_RESIDENCE_CODE + AGE_NAME, data=t_wu02Ew_ac)
reg_var <- as.data.frame.matrix(reg_var)
kable(reg_var)
| | Aged 16-24 | Aged 16 and over | Aged 25-34 | Aged 35-49 | Aged 50-64 | Aged 65-74 | Aged 75+ |
|---|---|---|---|---|---|---|---|
| E12000001 | 137047 | 974625 | 208221 | 358197 | 252002 | 16252 | 2906 |
| E12000002 | 386853 | 2705931 | 598669 | 986277 | 666548 | 57777 | 9807 |
| E12000003 | 294930 | 2029907 | 442170 | 739320 | 505157 | 41209 | 7121 |
| E12000004 | 247828 | 1777612 | 372762 | 656200 | 454732 | 39633 | 6457 |
| E12000005 | 290056 | 2106075 | 458558 | 771101 | 526138 | 51665 | 8557 |
| E12000006 | 315871 | 2299955 | 496908 | 834637 | 581497 | 60791 | 10251 |
| E12000007 | 374191 | 3197606 | 1056357 | 1101460 | 591100 | 60959 | 13539 |
| E12000008 | 461335 | 3391170 | 731448 | 1238930 | 852032 | 91961 | 15464 |
| E12000009 | 292428 | 2024395 | 417547 | 716798 | 531510 | 56730 | 9382 |
| W92000004 | 160618 | 1117784 | 238664 | 404170 | 284858 | 25071 | 4403 |
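To see what `xtabs()` is doing in the transformation above, here is a minimal sketch on a made-up long table. The `toy` data frame and its values are invented for illustration; only the column names follow the NOMIS extract. Each residence and age combination appears once per place of work, and `xtabs()` sums `OBS_VALUE` over the places of work that are left out of the formula.

```r
# Invented long-format data: one row per residence x place of work x age group
toy <- data.frame(
  USUAL_RESIDENCE_CODE = c("A", "A", "A", "A", "B", "B", "B", "B"),
  PLACE_OF_WORK_CODE   = c("A", "B", "A", "B", "A", "B", "A", "B"),
  AGE_NAME             = c("Aged 16-24", "Aged 16-24", "Aged 25-34", "Aged 25-34",
                           "Aged 16-24", "Aged 16-24", "Aged 25-34", "Aged 25-34"),
  OBS_VALUE            = c(10, 2, 20, 3, 1, 15, 4, 30)
)

# Cross-tabulate: rows are residence codes, columns are age groups,
# cells are OBS_VALUE summed over every place of work
toy_wide <- as.data.frame.matrix(
  xtabs(OBS_VALUE ~ USUAL_RESIDENCE_CODE + AGE_NAME, data = toy)
)
print(toy_wide)
```

The row names of the result are the residence codes, which is why the later join can use row names as its key.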
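This data gives information on the working population by place of residence and age group. Note that "Aged 16 and over" is the total of the other six age bands, so it must be left out when summing. For simplicity, we will combine the six bands into two. Let’s assume that people aged 65 and over could be retired, whilst people of all other ages can be expected to be working. This gives two variables:
retired <- c("Aged 65-74", "Aged 75+")
# Exclude the "Aged 16 and over" total so that nobody is counted twice
working <- setdiff(names(reg_var), c(retired, "Aged 16 and over"))
reg_var$w_Age <- rowSums(reg_var[, working])
reg_var$r_Age <- rowSums(reg_var[, retired])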
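Before combining age bands it is worth checking whether any column is already a total. As a quick sketch using the North East row (E12000001) from the table above, the six bands add up exactly to the "Aged 16 and over" figure, so that column should not be added to a working-age sum. The object names below are my own.

```r
# North East (E12000001) values copied from the cross-tabulated table above
north_east <- c(
  "Aged 16-24" = 137047,
  "Aged 25-34" = 208221,
  "Aged 35-49" = 358197,
  "Aged 50-64" = 252002,
  "Aged 65-74" = 16252,
  "Aged 75+"   = 2906
)
total_16_plus <- 974625  # the "Aged 16 and over" cell for the same row

# TRUE if "Aged 16 and over" is simply the sum of the six bands
bands_match_total <- sum(north_east) == total_16_plus

# Working-age and 65+ counts with the total column left out
w_age <- sum(north_east[c("Aged 16-24", "Aged 25-34", "Aged 35-49", "Aged 50-64")])
r_age <- sum(north_east[c("Aged 65-74", "Aged 75+")])
print(c(w_age = w_age, r_age = r_age))
```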
Although you could join your tables on region names, geography codes, where available, will give you a much cleaner merge.
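# Attach the data to the data frame component of the spatial object
# Note: by.y = 0 means join on the row names of reg_var
# Caution: merge() sorts by the key, so check that the row order still matches the polygons
regions_json@data <- merge(regions_json@data, reg_var, by.x = "rgn15cd", by.y = 0)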
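Because the rows of the attribute table have to stay in the same order as the polygons, one alternative to `merge()` is to look up each polygon's code with `match()` and bind the result on. This is a sketch on invented stand-in tables, not the author's method: `polys` plays the role of `regions_json@data` (with a hypothetical `polygon_id` column), and `attrs` plays the role of `reg_var`, with `r_Age` values computed from the table above.

```r
# Stand-in for regions_json@data, deliberately not sorted by code
polys <- data.frame(
  polygon_id = 1:3,  # hypothetical column marking polygon order
  rgn15cd    = c("E12000003", "E12000001", "E12000002"),
  stringsAsFactors = FALSE
)

# Stand-in for reg_var: codes are row names, as after as.data.frame.matrix()
attrs <- data.frame(
  r_Age     = c(19158, 67584, 48330),
  row.names = c("E12000001", "E12000002", "E12000003")
)

# Position of each polygon's code in the attribute table
idx <- match(polys$rgn15cd, rownames(attrs))
stopifnot(!anyNA(idx))  # fail early if any code has no match

# Bind the attribute on in polygon order; no rows are dropped or reordered
joined <- cbind(polys, r_Age = attrs$r_Age[idx])
print(joined)
```

The design choice here is that the polygon table drives the order, so the geometry and its attributes cannot drift apart.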
A useful library for showing your geographic data is tmap. Dr. Robin Lovelace has a good primer on using this as you can see in the link above. Notice from the map below that there are more people working beyond the age of 65 in the south and south-west. Bear in mind that r_Age is a raw count, so regions with larger working populations will tend to show higher values.
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(regions_json) + tm_polygons(col="r_Age")
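Raw counts and shares can tell different stories on a choropleth. The sketch below recomputes the 65+ figures for the nine English regions from the table above and expresses them as a share of all workers aged 16 and over. The data frame `older` and the column `r_share` are names I have made up for this illustration. On these numbers the South East (E12000008) has the largest count, while the South West (E12000009) has the largest share.

```r
# Values copied from the cross-tabulated table above (English regions only)
older <- data.frame(
  code       = paste0("E1200000", 1:9),
  aged_65_74 = c(16252, 57777, 41209, 39633, 51665, 60791, 60959, 91961, 56730),
  aged_75_up = c(2906, 9807, 7121, 6457, 8557, 10251, 13539, 15464, 9382),
  total_16up = c(974625, 2705931, 2029907, 1777612, 2106075,
                 2299955, 3197606, 3391170, 2024395),
  stringsAsFactors = FALSE
)

# Count of workers aged 65+ and their share of all workers aged 16+
older$r_Age   <- older$aged_65_74 + older$aged_75_up
older$r_share <- older$r_Age / older$total_16up

# Regions ranked by share, highest first
print(older[order(-older$r_share), c("code", "r_Age", "r_share")])
```

If a share column like this were attached to the spatial object, it could be mapped by passing its name to `col` in `tm_polygons()`.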